Performance Evaluation of Parallel Sparse Matrix-Vector Products on SGI Altix3700

Authors

  • Hisashi Kotakemori
  • Hidehiko Hasegawa
  • Tamito Kajiyama
  • Akira Nukada
  • Reiji Suda
  • Akira Nishida
Abstract

The present paper discusses scalable implementations of sparse matrix-vector products, which are crucial for high-performance solution of large-scale linear equations, on the SGI Altix3700, a cc-NUMA machine. Three storage formats for sparse matrices are evaluated, and scalability is attained by implementations that take the page allocation mechanism of the NUMA machine into account. The influence of the cache and memory bus architectures on the optimum choice of storage format is examined, and scalable converters between storage formats are shown to facilitate the use of higher-performance storage formats.
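As a rough illustration of the kind of kernel the abstract refers to, the sketch below stores a matrix in compressed row storage (CRS) and computes y = Ax one contiguous row block at a time. The abstract does not name its three formats, so the choice of CRS is an assumption, and the function names `dense_to_crs`, `split_rows`, and `crs_matvec_block` are hypothetical. The row blocks only mimic a static per-thread partition; pure Python cannot control NUMA page placement, so this shows the data layout and work split, not the paper's implementation.

```python
def dense_to_crs(dense):
    """Convert a dense matrix (list of rows) into CRS arrays.

    Returns (values, col_idx, row_ptr): the nonzeros in row order, their
    column indices, and the offset at which each row starts in `values`.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def split_rows(n_rows, n_parts):
    """Split rows 0..n_rows-1 into n_parts contiguous (start, stop) ranges."""
    base, extra = divmod(n_rows, n_parts)
    ranges, start = [], 0
    for p in range(n_parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def crs_matvec_block(values, col_idx, row_ptr, x, y, start, stop):
    """Write rows start..stop-1 of y = A x for a CRS matrix A into y."""
    for i in range(start, stop):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s


# Small example: a 3x3 matrix processed as two row blocks.
A = [[4, 0, 1],
     [0, 3, 0],
     [2, 0, 5]]
x = [1.0, 2.0, 3.0]
values, col_idx, row_ptr = dense_to_crs(A)
y = [0.0] * len(A)
for start, stop in split_rows(len(A), 2):
    # Stand-in for "one thread per block"; here the blocks run serially.
    crs_matvec_block(values, col_idx, row_ptr, x, y, start, stop)
print(y)  # [7.0, 6.0, 17.0]
```

On a cc-NUMA system with a first-touch page policy (assumed here, not stated in the abstract), the usual design choice is to have each thread initialize the same row block it later multiplies, so that the block's pages are allocated in that thread's local memory.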

Similar resources

A generic interface for parallel cell-based finite element operator application

We present a memory-efficient and parallel framework for finite element operator application implemented in the generic open-source library deal.II. Instead of assembling a sparse matrix and using it for matrix-vector products, the operation is applied by cell-wise quadrature. The evaluation of shape functions is implemented with a sum-factorization approach. Our implementation is parallelized ...

A Shared Memory Parallel Implementation of Block-Circulant Preconditioners

The parallel numerical solution of large scale elliptic boundary value problems is discussed. We analyze the parallel complexity of two block-circulant preconditioners when the conjugate gradient method is used to solve the sparse linear systems arising from such problems. A simple general model of the parallel performance is applied to the considered shared memory parallel architecture. Estima...

Benchmarking Performance of Parallel Computers Using a 2d Elliptic Solver

It was recently shown that block-circulant preconditioners applied to a conjugate gradient method used to solve structured sparse linear systems arising from 2D elliptic problems have very good numerical properties and a potential for good parallel efficiency. The aim of the presentation is to summarize and compare their parallel performance across a number of modern parallel computers: SGI Pow...

Performance Characterization of Matrix Multiplication on SGI Altix 3700

Matrix multiplication is widely used in a variety of applications and is often a core component of scientific computations, including graph theory, numerical methods, digital control and signal processing. Multiplication of large matrices requires a lot of computation time, as its complexity is O(n^3), where n is the dimension of the matrix. A serial algorithm to compute large ma...

Blocked-based sparse matrix-vector multiplication on distributed memory parallel computers

The present paper discusses implementations of sparse matrix-vector products, which are crucial for high-performance solution of large-scale linear equations, on a PC cluster. Three storage formats for sparse matrices (compressed row storage, block compressed row storage, and sparse block compressed row storage) are evaluated. Although using the BCRS format reduces the execution time, the impr...

Publication date: 2005